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1.  INTRODUCTION 


Let  { y ( t ) ,  teZ}  be  a  zero  mean,  covariance  stationary,  discrete  time 
series  with  autocovar iance  function  {R(v),  veZ}.  Then  in:.  I)  finite  order 
autoregressive  modeling  (Akaike  (1969)),  2)  spectral  density  estimation 

using  autoregressive  approximants  (Parzen  (197*0),  3)  maximum  entropy 

spectral  analysis  (Ulrych  and  Bishop  (197*0),  and  *0  explicit  minimum  mean 
squared  error  linear  prediction  (see  Anderson  (1971),  p.  419),  one  is  faced 
with  estimating  the  quantities  a^OKc^,  l<j£k<M,  satisfying  the  Yule-Walker 
equat ions 

k 

l  a.  (j)  R(j-v)  =  6  a?  ,  v>0  ,  (1.1) 

j-0  k  v  K 

where  a^(0)  *  *  and  5V  <s  The  Kronecker  delta. 

Let  y(l ) , . . . ,y(T)  be  a  mean  detrended  sample  realization  from  y.  The 
traditional  method  for  finding  estimators  a^(j)  and  a£  of  ot^(j)  and  has 
been  to  estimate  R(0) , . . . ,R(M)  by  the  positive  definite  sample  autocovariances 

,  T;|v| 

R(v)  J  y(t)y(t+|v|)  ,  | v |  £  M, 

t=) 

and  then  to  use  (1.1)  with  R(v)  replacing  R(v) .  These  Yule-Walker  estimators 
have  two  main  attractions;  I)  they  can  be  calculated  very  quickly  by 
Levinson's  (19**7)  algorithm 

£,(1)  -  -R(l)/R(0)  ,  af  -  R(0)  n-af  (1 )] 


(1.2) 
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k-1 

ak00  “  -Ho  <Vl  0 ) R(k-j) l/a2_i  , 

ok(J)  “  ak_j  (j)  +  ak(k)ak_ j (k-j) ,  j**l . k-1,  (1.3) 

°k  "  °k-l  ll'“k(k)1  * 

k»2,...,M,  and  2)  they  are  guaranteed  to  lead  to  stable  prediction  filters 

and  positive  spectral  density  estimators  since  (Parzen  (1961))  the  zeros  of 

k  .  1 

the  complex  polynomials  gk(z)  *  £  ak(j)zJ  are  all  outside  the  unit  circle. 

v  J=0 

If  we  define  the  (k  x  k)  Toeplitz  matrix  to  have  (i,j)th  element 
R  ( 1 1  -  j  | )  and  the  (k  x  1)  vector  r^  to  have  i^  element  R(i),  we  can  write 
the  sample  Yule-Walker  equations  for  v=l,...,k  as 


rk  “k 


(1.4) 


where  oik  *  (ak(l ) , . . .  ,ak(k))  .  Then  (1.4)  are  the  normal  equations  for 
the  ordinary  least  squares  estimators  ak  in  the  linear  model 

yk  *  _xkak  +  e  ♦  where 
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"  y(l) 

•  Xk  * 

0  0  ...  0  "j 

y(2) 

o 

o 

>* 

y(3) 

y (2)  y(l)  ...  0 

y(k+l ) 

y(k)  y(k-l)  ...  y(l) 

•  • 

y(T) 

•  • 

y(T-l)  y (T-2)  ...  y(T-k) 

0 

y(T)  y(T-l)  ...  y (T-k+1 ) 

0 

0  y(T)  ...  y (T-k+2) 

•  •  • 

.  i  . 

0  0  ...  y(T) 

are  (T+k)  x~  1  and  (T+k)  x  k  respectively.  Thus  the  Yuie-Walker  estimators 

correspond  to  the  ordinary  least  squares  regression  of  y(t)  on  y(t-l) . y(t-k) 

for  t*=l , . . . ,T+k;  with  any  unobserved  y  replaced  by  its  expected  value  of  zero. 

Thus  the  Yule-Walker  estimators  suffer  numerically  by  being  the  solution 
to  normal  equations  for  a  linear  model  rather  than  being  the  result  of 
applying  a  tr iangularizat ion  method  (see  Lawson  and  Hanson  (197*0,  for  example) 
to  the  design  matrix  X^.  This  can  be  particularly  important  in  time  series 
because  of  ill-conditioning  caused  by  the  peculiar  form  of  X^.  In  addition, 
if  the  variance  of  y  is  large,  then  the  effect  of  replacing  the  observations 

y(l-k) ,. . . ,y(0)  and  y(T+l) . y(T+k)  that  are  on  the  ends  of  the  observed 

data  by  zeros  raises  the  possibility  of  obtaining  poor  estimators. 

There  have  been  three  main  attempts  to  alleviate  the  problems  in  Yule- 
Walker  estimators.  First,  are  what  are  often  called  least  squares  estimators 
(see  Anderson  (1971),  p.  211).  These  are  obtained  by  dropping  the  first  and 
last  k  rows  of  y^  and  X^  and  then  performing  ordinary  least  squares  regression. 


These  estimators  seem  to  reduce  end  effects  but  cannot  be  obtained  recursively 
for  and  also  are  not  guaranteed  to  lead  to  stable  filters.  The 

second  method  is  the  Burg  (1968)  algorithm  which  seems  to  moderate  end  effects, 
is  recursive  in  nature,  and  leads  to  stable  filters.  However,  it  appears  to 
be  somewhat  ad  hoc  in  nature  and  its  numerical  properties  are  difficult  to 
describe.  The  third  method,  due  to  Pagano  (1972),  consists  of  solving  the 
normal  equations  (1.4)  via  the  numerically  stable  modified  Cholesky 
tr iangular izat ion  (Wilkinson  (1967))  of  r^. 

The  purpose  of  this  paper  is  to  show  how  the  factors  needed  to  find  the 
modified  Gram-Schmidt  tr iangularization  of  Xu  corresponding  to  Pagano's 

n 

triangularization  of  Tu  can  be  found  recursively  while  only  needing  to  store 

n 

two  (T+M)  x  1  vectors  (rather  than  storing  the  entire  Xu  matrix).  From 
this  we  find  a  design-matrix  oriented  algorithm,  called  the  Toeplitz 
Gram-Schmidt  algorithm,  for  finding  the  Yule-Walker  estimators.  We  show 
that  the  Toeplitz  Gram-Schmidt  algorithm  is  roughly  as  fast  as  using 
Levinson's  algorithm,  while  providing  a  unified  framework  for  deriving  other 
stable,  more  robust  estimators.  The  algorithm  allows  calculation  of 
diagnostics  useful  in  measuring  end  effects  as  well  as  determining  if 
particular  observed  values  of  y  are  exerting  undue  influence  on  the 
estimation  procedure. 

The  algorithm  is  given  in  section  2,  its  numerical  properties  described 
in  section  3,  and  its  application  for  robust  autoregression  and  autoregression 
diagnostics  discussed  in  section  4. 
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2.  THE  TOEPLITZ  GRAM- SCHMIDT  YULE-WALKER  ALGORITHM 

We  seek  ordinary  least  squares  estimators  of  a^.  k=l,...,M,  for  the 
linear  models  yk  =  -Xkak  +  e  without  using  the  normal  equations. 

By  Levinson's  algorithm  we  can  obtain  all  the  a^(j)  and  from 

only  R(0)  and  a^(k),  k=l,...,M,  by 

of  -  R(0)  tl-S|(l)]  (2.1) 

«k(j)  =ak_j(j)  +  ak(k)ak_| (k-j)  ,  j*l,...,k-l,  (2.2) 

o^=  ak_|  [l-o^(k)],  (2.3) 

k=2, . . . ,M.  Further  if  one  only  wants  to  estimate  ojVR(O),  as  is  often 
the  case,  then  only  a^(k),  k*l,...,M  are  required. 

Thus  the  algorithm  we  propose  consists  of  determining  recursively 
ak(k),  k»l,...,M,  by  calculating  at  the  kth  step  the  kth  and  (M+!)st 
columns  of  the  kth  stage  of  the  modified  Gram-Schmidt  decomposition 
(Bjorck  (1967)),  MGSD ,  of  the  (T+M)  x  (M+l)  matrix  [X  *, y^) .  Then  these 
ak(k)'s  are  used  in  (2 . 1 ) — (2 . 3)  to  find  all  the  desired  estimators. 

Theorem  2.1  shows  how  the  ak(k)'s  are  calculated  while  Theorem  3>l  shows 
that  the  algorithm  is  indeed  an  application  of  the  MGSD. 

Define  the  lag  operator  L  on  the  (n  x  1)  vector  a  ■  (a ( 1 ) . a(n))T 

to  be  L (a)  -  (0,  a ( 1 ) ,. . . ,a(n-l))^.  Then  we  note  that  the  columns  of 
are  Xj,...,xM  where  Xj  -  L(yM)  an<1  Xj  *  J"2,...»M. 


ife®..,. 
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Xj  ,  and  for  k  =  1 . M: 


!(k+1) 

* 

!(k)  + 

a(k)  x(k) 

*(k+l ) 

= 

L(?(k) 

+  a (k)y 

(k)}  • 

Then 

a) 

a(k)  = 

“k 

(k) 

,  k= 

1  , . . . ,M. 

k-i 

b) 

?(k)  = 

In 

•-I 

+ 

Xj  ,  k«l,...,M+l, 

k-i 

c) 

_(k)  " 

?k 

-  I 
1=1 

Vi® 

x.  .,  k=l,.,.,M+l, 

«x- J 

where  any  summation  is  zero  if  its  upper  limit  is  less  than  its  lower  limit. 
Proof 

We  proceed  by  induction  on  k.  For  k*l,  (b)  and  (c)  are  true  by 
definition,  while  a(1)  *  -TR(1)/TR(0)  *  ctj(l).  For  k>l,  we  have  by  the 
induction  hypothesis. 


Theorem  2. 1 


LCt  ?(»)  *  ?M’  *(l)  =  L(!m} 


a(k) 


M  loo 

!(k)  ?(k) 


*(w) -  Ll^(k-i) +  r(k-DJ 


•s ir' 
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k-2 

L^(k-1)+j£,  ak-2^*k-1-J^+°k-1  *k_1)  L^VjI1ak-2^?JI 

k-2  a  k-2 

-k+jl,  0lk-2^^k-j+ak-l  ^k-^  ^-l+j£,  ak-2^*j^ 


yVl(k',)!!)+j1lV2(j)+Vl(k*,)V2(k'l*j)1  xk-j 

j  =  !  -  J 


\  +  l  Vi(j)xk-j 

J-1 


by  (1.3).  Similarly, 


!(k) =  [?m  +  J.  V2(j)?j}+vi(k",)[?k-i  *l  V2(J)?k-i-j] 

j=*i  j  j*i  -* 

k-2 

*  +  ak-l  *k-1^k-l  +  j^/ak-2^^+°k-l  (k"1)ak-2^k_1“j^?j 


rv.  • 

'  ?H  *  J,  Vlljl  ?j 

J  =  1  J 


We  have  thus  shown  (a)-(c)  being  true  for  k-1  implies  (b)  and  (c)  are 
true  for  k  and  it  remains  to  prove  (a).  To  do  so  we  need  several  well-known 

A 

facts  about  the  modified  Cholesky  decomposition  of  Tu  and  the  Gram-Schmidt 

n 

decomposition  of  X^. 

Let  XM  -  Q^R^  be  the  Gram-Schmidt  decomposition  of  XM,  i  .e  R^is  an 
(M  x  M)  unit  upper  triangular  matrix  and  is  a  (T  +  M)  x  M  orthogonal 
matrix  having  columns  q1,...,qM,  i .e  ■  DM,  a  diagonal  matrix.  Let 

A  *  A  a  J  A  A 

rM  •  LU0ULU  be  the  modified  Cholesky  decomposition  of  ru,  i.e  Lu  is  an 
n  nun  n  —  n 


(M  x  M)  unit  lower  triangular  matrix  and  DM  Is  am  (M  x  M)  diagonal  matrix. 


Then  ?M  -  j  X^  -  -  y  rJd^,  which  by  the  uniqueness  of  the  modified 

Cholesky  decomposition  (Wilkinson  (1967)),  gives  »  R^  and  6^  ■  y  D^. 
Further  (Pagano  (1972)),  with  ■  R(0),  we  have 


cmi  ■  r  ' 


a,0)  I 


a* (2)  S2(l)  1 


(2.4) 


a^_] (M“ ' )®m-1 • • -aM-l ^ ^  ' 


Dm  *  Diag  (a^.o | , . . .  ,oj*_  j )  , 

and,  defining  0M  *  (a^ ( 1 ) , . . . ,aM(M) )  , 


?M  "  -^mV'  <&  ?m 


“k(k) 


9k 


k*l , . . . ,M. 


(2.5) 


Since  XM  »  Q^Rj^  we  have  Q^XM  -  “  DMRM»  an  uPPer  triangular  matrix, 

which  gives 
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1  \T  , 


Finally,  -  X^-l  -  XM(CH' )  ,  by  (2.4), 


?k  '  3c  *  .1.  “k-i(j)  Vi  • 

J  =  1 


which  we  have  shown  under  the  induction  hypothesis  is 


Thus , 


T  T  k-1 

*(k)?(k)  <k[?M  + 


-(k)^(k) 


%  ?k 


%  % 


ak(k) 


by  (b)  and  (2.5). 


Corol lary  2.1 


The  vectors  y^j  and  are  the  forward  and  backward  prediction 

errors  one  would  obtain  using  a  memory  (k-1)  linear  prediction  filter 
assuming  y(l-k)  =  —  =y(0)=y(T+l)*. .  ,=y(T+k)=0,  i  .e  if  we  denote  by  yj^j 
and  the  elements  of  y^  and  ,  then  for  j=l,...,T, 


k-1 

Ik) =  ■ 

1 

<  o 

U)y(J-i) 

k-1 

!(k)  *  - 

l  V, 

JL=1 

(&)y (j+A) 

is  the 

value  of 

a  -  (a  ( 1 ) 

I  T 
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T  k-1 

SP(a)  =  l  { y ( j )  +  l  a(Oy(J-£)}2 

j=l  1=1 


and 


T  •  k-1 

$B(a)  *  l  {y  ( j )  +  l  aU)y(j  +  Jt)}2 
j=l  1=1 


assuming  y(l-k)=. . .=y(0)=y (T+l )=. . .=y (T+k)=0.  Further, 


?(k)?(k) 


=  SB(“k-l) 


sf(vi}  =  rlk)?(k) 


(2.6) 


k=l,...,M+l. 

The  Toeplitz  Gram-Schmidt  Yule-Walker  algorithm  for  finding 
( j )  •  °j<  >  consists  then  of  the  following  steps: 

'>  »<1)  '  '  R(0) 
i i )  for  k=l , . . . ,M: 


ak(k)  - 


I(k)l(k) 

~(k)?(k) 


al  =  ak-l 


Y(k+i)  =  U k)  +  ak(k)?(k) 
;<k+o  =  L(?(k) +  ak(k)y(k)) 


iii)  use  (2.1),  (2.2)  to  obtain  the  other  a^(j)*s. 


(2.7) 

(2.8) 
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3.  NUMERICAL  PROPERTIES  OF  THE  ALGORITHM 

3.1  Arithmetic  Operations  and  Required  Storage 

To  obtain  9U  there  are  2(M-I)  updates  of  the  form  a  -  cb  for  (T+M)  x  1 
vectors  a  and  b  and  2K  inner  products  of  (T+M)  x  1  vectors.  Counting  a 
multiplication  and  addition  as  an  operation,  there  are  thus  (T+M)(4M-2) 
operations  to  obtain  0M- 

To  use  Levinson's  algorithm,  R(0) , . . . ,ft(M)  must  first  be  calculated, 
which  takes  approximately  TM  operations  if  inner  products  are  used  or 
2T  log2  T  operations  if  the  Fast  Fourier  Transform  is  used. 

In  terms  of  speed  then  the  algorithm  of  Theorem  2.1  is  slightly  slower 
than  using  Levinson's  algorithm.  Neither  of  the  algorithms  requires 
significant  storage;  approximately  2  (T+M)  locations  for  the  Toeplitz 
Gram-Schmidt  algorithm  versus  T+M  for  Levinson's  algorithm  when  the  data 
vector  is  stored. 

3.2  Error  Analysis 

To  determine  the  numerical  accuracy  of  the  proposed  algorithm  we  shall 

first  prove  that  it  is  a  method  of  finding  the  matrix  Q  and  the  last  column 

of  R  in  the  MGSD  of  [Xu*.yu]  using  only  two  vectors  of  length  T+M. 

n  «n 

Let  A  be  a  full  rank  (n  x  p)  matrix  having  columns  a^,...,a^.  The 

MGSD  algorithm  is  a  numerically  stable  (Bjorck  (1967))  method  of  determining 

the  Gram-Schmidt  factors  Q=(qj,...,q  )  and  R  of  A  in  p  stages.  At  the  k**1 

stage,  the  k**1  column  of  Q  and  the  k^  row  of  R  are  determined.  Conceptually, 

one  defines  the  sequence  of  matrices  ■  A,  ^ , . . .  ,Q^  ■  Q,  with 

(k)  (k) 

having  columns  q j  {....q^  ,  by: 
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(k) 


f 


where  R^j  is  the  ratio  of  inner  products 


j<k 


J  >k 


Rl. 

kj 


(?jk  l>)T 
(,<»)T  9<k> 


Then  q^  *  and  one  can  actually  accumulate  ^  , . . .  ,Q^  in  a 

single  matrix. 

We  have 


Theorem  3- 1 

The  vectors  x^  and  y^  in  Theorem  2.1  are  the  kth  and  (M+l)s* 


co 


lumns  of  in  the  MGSD  of  matrix  [Xu:yuJ. 

n 


Proof 


Parts  (b)  and  (c)  of  Theorem  2.1  show  that  for  k>1  f  y^^  and  Xjkj  and 
the  ordinary  least  squares  residuals  of  the  regression  of  yu  on 


'  ?i 


i — »_xk-i  an<*  \  on  _xi  respectively  and  thus  (see  Clayton 


(1971))  they  are  q^j  and  q^  respectively.  For  k=l, 


M2 


?2  ?l 

T 

?l  ?l 


rfi  ?i 

T 

?1  ?1 


R1,M+1  “  al(1)' 


iUS 


13 


Since  the  Toeplitz  Gram-Schmidt  algorithm  is  a  subset  of  the  MGSD  we 
can  use  the  results  of  Bjorck  (1967).  We  use  the  techniques  and  notation 
introduced  by  Wilkinson  (1965). 

Using  equation  (5.8)  from  Bjorck  (1967)  and  assuming  that  all  inner 
products  are  accumulated  in  double  precision,  we  have 

||fl(y(M))  -  y(M)  ||  <  1-9  (M+l)2_t  /EToj  ,  (3.1) 

where  2  1  is  the  machine  precision  of  the  computer  being  used  to  perform 
the  calculations.  This  proves  that  the  Toeplitz  Gram-Schmidt  algorithm 
gives  a  numerically  stable  prewhitening  filter.  By  symmetry,  a  formula 
identical  to  (3.1)  would  apply  for  x^. 

The  roundoff  error  for  ak(k)  can  be  obtained  by  letting  <*k(k)  be 
the  exact  solution  (no  rounding  errors)  using  the  inexact  fl(x(k))  and 
fl (y(k))  so  that 


I f  1  (“k (k) )  "  ak(k)  li.|f  1  («k(k))  -ctk(k)| 

+  |  ak(k)  "  I 

Using  a  perturbation  bound  on  and  equation  (4.11)  in  Bjorck,  one  obtains 

|fl(ak(k))|-ik(k)|<(2.0l|ik(k)|  +  .01  a*)  2_t , 

|ak(k)  -  ak(k)|<  8k^  (k+1)  /r(0)  2_t  , 


b-  il _ 


and  hence 


|fl(ak(k))  -ak(k)|<(2.0l|ak(k)|  +  .0l  £2+8k*(k+1)  /rW  )2_t  . 

From  this  it  is  concluded  that  the  algorithm  is  stable. 

To  illustrate  the  numerical  accuracy  of  the  algorithm,  we  use  the 

example  of  a  constant  y(t)  (see  Davis  and  Qualls  (1977)).  For  T=1000 

and  M=20,  has  condition  number  K(fu)  =  505-  The  results  of  using 

n  n 

Levinson's  algorithm  and  the  Toeplitz  Gram-Schmidt  algorithm  (multiplied 
by  2*  to  make  them  reasonably  machine  independent)  are  presented  in 
Table  3.1. 

Table  3.1 

Levinson  1168  505 

Toeplitz  .M  22.5 

Gram-Schmidt 
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<i.  AUTOREGRESSION  DIAGNOSTICS  AND  ROBUST  AUTOREGRESSION 

In  this  section  we  consider  the  application  of  methods  in  ordinary 
regression  analysis  for  robustness  and  detecting  influential  observations 
in  the  autoregression *si tuation.  We  will  show  that  the  Toeplitz  Gram-Schmidt 
algorithm  provides  a  unified  framework  for  carrying  out  this  analysis. 

4.1  Detecting  Influential  Observations 

In  the  usual  full  rank,  n-observat ion,  p-parameter,  linear  model 
y  ■  XB  +  e,  the  diagonal  elements  of  the  hat  matrix  H  **  X(X^X)  ^X^  are 
often  used  (Hoaglin  and  Welsch  (1978))  to  determine  if  undue  weight  is  given 
any  particular  observation  in  finding  the  vector  of  predicted  values 
y  -  XB  -  X(XTX)_,XTy  -  Hy. 

If  X  -  QR  where  QTQ  -  D  =  Diag  (dj,...,dp),  we  have 
H  -  QR[r",D_1R_T]RTQT  -  QD"^1.  Thus 


JJ 


l  Qj t/d 

l-l  JX,/ l 


i  J = 1  >  •  •  • » n 


In  the  autoregression  case  then  these  hat  diagonals  can  be  easily 
accumulated  in  the  Toeplitz  Gram-Schmidt  Algorithm.  If  we  define 
d(k)  “  ^(k)?(k) »  k-l,...,M+l,  and 

h(l)  *  ^X(  I )  ^  2/<J  ( 1 )  ’  . T+M» 

h^j  ■  j  +  ^X(k)  ^2/^d(k)  *  j*  I  , . . . ,  T+M ,  k*2 , . . . ,  M+l , 


T+M  T 

then  *  ^(k) ' '  vk) '  are  t*1e  diagonals  of  t^e  regression  of  y M  on 

-Xj .... ,~xk_ j .  As  such  they  can  be  used  both  for  detecting  influential 
observed  y's  and  for  measuring  end  effects.  Note  that  calculation  of  the 
h^j  can  be  done  easily  by  just  caiculting  d^  and  h^  in  (2.7)  and 

<1(k>'-<k>  ln  (2-8)- 

k.2  Alternative  Estimators 

A  wide  variety  of  possibly  robust  estimators  of  autoregressive 

parameters  are  suggested  by  the  results  of  Theorem  2.1  and  Corollary  2.1, 

i.e  the  Yule-Walker  a  (k)  is  the  regression  coefficient  of  the  regression 
-  k 

through  the  origin  of  the  forward  prediction  errors  y^)  on  the  backward 
prediction  errors  Thus  any  of  the  M-estimators  (Huber  (1973)  could  be 

used  rather  than  ordinary  least  squares  to  obtain  estimators  a. ( 1 ) , . . . ,a^(M) . 

A  A 

Then  these  would  be  used  instead  of  (l ) , . . . ,o  (M)  in  (2.i)-(2.3)»  thus  giving 
what  could  be  called  £2_M  autoregressive  parameter  estimators,  f°r  the  use 
of  Levinson's  algorithm  and  M  for  the  use  of  M-estimators.  We  note  that  such 
M-estimators  probably  need  not  lead  to  a^(k)'s  that  are  less  than  one  in 
absolute  value  (A  necessary  and  sufficient  condition  for  stability  of  resulting 
prediction  filters)  but  this  may  actually  be  a  benefit  in  that  it  would 


indicate  that  the  original  time  series  is  not  stationary. 

Now  since  1  -  y|^  1  =  0  i  f  yj5j  1  «  0  we  have  also  that  a^(k)  is  the 

negative  of  the  ordinary  sample  correlation  coefficient  of  the  forward  and 
backward  prediction  errors  for  order  (k-1).  Thus  again  one  could  use  robust 
methods  of  estimating  correlation  coefficients  (Huber  (1977))  to  obtain  other 
estimators  ct^k)  • 
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J*.3  Relationship  of  the  Toeplitz  Gram-Schmidt  and  Burg  Algorithms 

An  important  method  of  autoregress i ve  parameter  estimation  is  the  Burg 
(1968)  algorithm.  With  the  framework  provided  by  the  Toeplitz  Gram-Schmidt 
algorithm  it  is  easy  to  describe  what  the  Burg  algorithm  does.  It  uses  the 
same  updating  equations  but  uses 


\(k) 


l  V 


j+k  i+k 
00  x(k) 


r[J  (i<J<io}2  +  J  <>>2] 


instead  of  ak(k).  By  (2.6)  note  that  a^Ck)  is  identical  to  ^(k)  except 
that  the  summations  range  from  1  to  T-k  instead  of  from  1  to  T+M.  Note 
also  that  a.  (k)  is  the  value  of  a  minimizing 


l.  {(Wk? +  a  x(ki>2  +  +  a  y(k^2>  » 


i .e  the  sum  of  squares  of  error  in  regressing  the  "observed"  forward 

prediction  errors  on  the  "observed"  backward  prediction  errors  and  vice  versa, 

i ,e  the  Burg  algorithm  never  uses  a  y  outside  the  range  of  observed  values. 

We  note  further  that  Stuart  (1955)  has  shown  that  if  (x, ,y. ),..., (x  y  )  is 

•  1  n ,  n 

a  random  sample  from  a  bivariate  normal  distribution  with  zero  means  and 
equal  but  unknown  variances,  then 
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j. *'y‘ 

jil  *?♦  I  yfl 

1  i*i  '  f-i  ' 


is  an  asymptotically  efficient  estimator  of  the  correlation  coefficient  of 
x  and  y.  Thus  the  Burg  estimator  of  a^(k)  can  also  be  interpreted  as  the 
negative  of  a  correlation  coefficient  estimator. 

Thus  if  k  is  large  relative  to  T  and  R(0)  is  small  the  Burg  algorithm 
may  be  inferior  to  Yule-Walker  estimation,  while  if  R(0)  is  large  it  will 
probably  be  superior  because  of  end  effects  in  Yul.e-Walker  estimation. 
Finally  we  note  that  the  a^(k)  are  guaranteed  to  be  less  than  one  in 
absolute  value. 


L 
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